Filesystem only snapshots#2995
Conversation
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit 6a95194. Bugbot is set up for automated code reviews on this repo. Configure here. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Code Review
No critical findings or correctness issues were identified in the code changes, and no review comments were provided. I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
f669bc7 to
c609e2c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c609e2c116
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| RootfsDroppedBuilds: uint32(rootDropped), | ||
| } | ||
|
|
||
| if memfileHeader != nil && memfileHeader.Metadata != nil { |
There was a problem hiding this comment.
Preserve rootfs metadata after fs-only reload
This nil-memfile path only works while Pause still has the rootfs header in hand. Once a filesystem-only snapshot is uploaded and later reloaded, there is no memfile header/object, but storageTemplate.SchedulingMetadata still calls t.memfile.WaitWithContext and returns nil on that error before reading rootfs (packages/orchestrator/pkg/sandbox/template/storage_template.go:278-282). In contexts that report scheduling metadata from a loaded template, filesystem-only builds therefore lose even the rootfs affinity data; update that provider to tolerate the missing memfile and call FromHeaders with a nil memfile header and the resolved rootfs header.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed! storageTemplate.SchedulingMetadata now reports rootfs-only scheduling metadata for reloaded fs-only templates instead of dropping it.
c609e2c to
b2273c7
Compare
b2273c7 to
d2c38fc
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit d2c38fc. Configure here.
|
|
||
| const ( | ||
| // minEnvdVersionForKVMClock is the minimum envd version that supports kvm-clock. | ||
| minEnvdVersionForKVMClock = "0.2.11" |
There was a problem hiding this comment.
Nit. It appears that this is duplicated with create_sandbox.go. Not that it's likely to be updated, but could be extracted into a single const.
| func (s *Sandbox) guestSync(ctx context.Context) error { | ||
| syncTimeout := s.guestSyncTimeout(ctx) | ||
|
|
||
| ctx, span := tracer.Start(ctx, "envd-guest-sync") |
There was a problem hiding this comment.
would we like a metric here as well to tell how long it usually takes?
There was a problem hiding this comment.
In general, do we have enough data to separate the regular pause from fs-only in our metrics?
There was a problem hiding this comment.
Added a metric for guest sync latency and span attribute to separate between full and filesystem-only snapshots.
d2c38fc to
dc9e2ea
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Add Sandbox.Pause(WithFilesystemSnapshot()) to produce a snapshot that persists only the rootfs and skips the guest memory snapshot. The default remains a full memory snapshot; fs-only is strictly opt-in. When enabled, before pausing the VM a mandatory guest sync is run via envd (a failed sync fails the pause, since no memory snapshot preserves the page cache and the rootfs would otherwise be missing acknowledged writes), and memory prefetch is cleared. CreateSnapshot is still called for its disk drain+flush side effect, but the memfile diff is left empty (NoDiff) with a resolved-nil header. scheduling.FromHeaders now tolerates a nil memfile header, emitting rootfs-only scheduling metadata instead of dropping the metadata entirely. Snapshot.FilesystemSnapshot records the decision at pause time: it can't be inferred from the diff shape, since a memory snapshot with zero dirty pages also yields a NoDiff memfile but still needs its snapfile uploaded. Add -fs-only flag in cmd/resume-build to allow testing the operation locally. Snapshots taken with -fs-only should include only rootfs (meta)data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…ride flag The mandatory pre-pause guest sync used a fixed 5s deadline, which is too short under IO contention — a failed sync fails the (filesystem-only) pause. The dirty page cache a sync must flush is bounded by guest RAM (pages already written back are not re-flushed), so scale the deadline by RamMB against a pessimistic 50 MiB/s flush-throughput floor, clamped to [5s, 2min]. Add the guest-sync-timeout-milliseconds feature flag as an operational override: a positive value pins the deadline regardless of RAM; 0 (default) keeps the RAM-derived value. Unit tests cover the scaling/clamping and the flag override/fallback. Addresses PR review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…pshot Export the memorySnapshot helper as MemorySnapshot and embed it in Snapshot as a single field, replacing the flat MemfileDiff / MemfileDiffHeader / MemfileBlockSize fields. The Diff, DiffHeader, and BlockSize fields are exported (read cross-package by the upload, layer, and server paths); the header and newBytes scheduling inputs stay unexported as they are used only at Pause time. Pure rename/restructure: no behavioral change. Consumers updated to read snap.MemorySnapshot.Diff / .DiffHeader / .BlockSize. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A filesystem-only snapshot now uploads only rootfs.ext4(.header) and metadata.json. The snapfile upload is skipped in both the V3 and V4 paths (it is still created during pause for its disk drain+flush side effect, just not persisted), and NewUpload skips resolving the memfile compress config — which would otherwise fail validation on the zero memfile block size when compression is enabled. The memfile body and header need no change: NoDiff.CachePath returns an empty path (body self-skips) and the resolved-nil diff header makes both upload paths return early. Absence of the memory artifacts on storage is the on-disk signal that a snapshot is filesystem-only. Memory snapshots are unaffected (FilesystemSnapshot is false). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
RebootSandbox cold-boots a fresh Firecracker VM from a snapshot's rootfs without restoring guest memory — the resume primitive for filesystem-only snapshots. It masks an empty memfile (sizing NoopMemory only), selects the NBD provider (empty rootfs cache path) so guest TRIM and overlay chaining work like a normal resume, uses the Sync IO engine, and boots via systemd init. The sandbox is marked running only after envd is ready, matching ResumeSandbox's routing guarantee; the caller's absolute end time is honored so queue delay can't extend the TTL. To support the deferred mark-running, CreateSandbox gains a WithDeferredMarkRunning option (cold boot otherwise marks running before envd is up). A StartTypeReboot start type is added, and the ext4 rootflags are pulled into a named constant documenting that "noload" must never be set — fs-only resume relies on ext4 replaying its journal on cold boot. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A memory resume calls setMmds with the access-token hash (fc/process.go Resume) so the guest envd can authenticate /init against the MMDS hash. Cold boot (Process.Create) only configured the MMDS transport and never wrote the data, so a rebooted secure sandbox would start envd with no hash and /init would fall into the unauthenticated first-time-setup branch. Process.Create now writes the MMDS metadata before startVM when ProcessOptions.AccessToken is set, mirroring Resume. RebootSandbox always sets it; an empty token hashes to the "no token" value, matching Resume's behavior for non-secure sandboxes. Both the MMDS hash and the /init token sent by WaitForEnvd are sourced from config.Envd.AccessToken, so they always agree. Template-build cold boots leave AccessToken nil and are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a -reboot flag that cold-boots from the build's rootfs via Factory.RebootSandbox instead of resuming from the memory snapshot, routed through a new runner.startSandbox helper used by the resume, interactive, cmd, and pause paths. Lets the filesystem-only pause + reboot resume cycle be exercised end to end locally (e.g. `resume-build -fs-only -cmd-pause '...'` then `resume-build -reboot -cmd 'cat /proc/uptime'`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a metric for measuring the latency of syncing the guest filesystem before taking a snapshot. Also, add a span attribute in `sandbox-snapshot` span that shows when a Pause() operation is on behalf of a filesystem-only snapshot. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
dc9e2ea to
6a95194
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |

What
Adds an opt-in snapshot mode that persists only the filesystem (rootfs) and skips the guest memory snapshot. Resuming such a snapshot cold-boots (reboots) a fresh VM from the rootfs instead of restoring memory.
Sandbox.Pause(WithFilesystemSnapshot())— pauses without a memory snapshot; uploads only the rootfs + metadata (no memfile/snapfile).Factory.RebootSandbox— the resume primitive that cold-boots from a snapshot's rootfs, marked routable only once the guest is ready.cmd/resume-buildgains-fs-onlyand-rebootto exercise the full cycle locally.Normal memory snapshots are unchanged — this mode is strictly opt-in.
Why
Memory snapshots are the expensive part of a pause (large artifacts, slow upload, more storage). Many workloads don't need warm RAM restored on resume — they only need their disk state. A filesystem-only snapshot is cheaper and smaller to take and store, and also serves as an escape hatch when memory artifacts are slow, oversized, or missing. The trade-off is explicit: a reboot loses RAM, running processes, connections, and unsynced writes; only the synced filesystem survives.
Scope / not in this PR
Orchestrator-only. No public API/proto/SDK wiring yet — the reboot path is reachable via
resume-buildbut not yet over gRPC. API-side resume selection and auto-resume/connectgating (so a disk-only snapshot is never silently rebooted) are follow-ups. Guest cold-boot speedups live on a separate branch.Testing
Verified to build/lint clean per commit. End-to-end pause→reboot validation (synced files persist, uptime resets, correct on-storage layout) is done locally via
resume-build; secure-sandbox envd auth after reboot still needs validation against a deployment.